Written by J.P. Wares, Professor, University of Georgia, jpwares [at] uga.edu
This is shared under by-nc-sa/4.0. I’m not writing it to be some polished final thing, but something that shifts as new ideas arrive and new people use or modify parts of it. This text may be used following these guidelines: https://creativecommons.org/licenses/by-nc-sa/4.0/
This document is being developed for the cross-listed ECOL/GENE 4530/6530 at University of Georgia to improve the experiential nature of learning the methods necessary for the field of “molecular ecology”. Maintaining it as an .Rmd allows direct analytical opportunities (and familiarity with basic statistical coding approaches) and the ability to incorporate some simple simulation tools using Shiny. It will also let me update as needed in a straightforward way.
…
Surely you already have some thoughts on how turtles move, and what that could mean for the spatial distribution of diverse phenotypes and molecular diversity within their range? (photo: J. Wares)
I think… I think… I’m writing this in a way that more advanced students can skim the first few chapters and gain something from focusing on the later ones; a more novice course might only get through the first several chapters and then just read appropriately focused papers. (In the undergrad/grad mix I expect to teach, the course will move at one pace but the grads at another and with a distinct type of guidance, starting to use more time later in the semester to present on methods or case studies.) I want to think about how to teach molecular ecology, not just about how to do it. It seems there has to be some coding expertise that comes into play at this point, and some experiential practice. So, I think this is what is going to work. I hope.
Build in individual 5-min talks on a recent paper in the journal Molecular Ecology or equivalent; your job is to do this by end of semester, just give me a heads-up a week or so in advance so I can schedule. You are basically giving a lightning talk with 1-2 figures and putting into context of what we are learning or have learned, and why you find it interesting. (10%)
“MONTE CARLO” papers will be assigned 3-4 times during semester to write a one-page essay (single spaced) on your questions related to class and your understanding about those topics and how to address them, with 1-2 references. When assigned they are due in 48 hours. (10%) [JPW see notes from Lindsey]
Analytical exercises are contract-graded; turn them in on time for full credit so I can understand how the material is reaching everybody. When assigned, they are due in 48 hours. (20%)
Draft 1 proposal: a 3-page proposal on a topic of interest to you in the field of molecular ecology, with citations, appropriate figures, methods, and background. Turning it in on time is 10% of your grade; this draft will be peer-reviewed. (10%)
Doing peer review for another student is 10% of your grade
Draft 2 proposal: maybe reverse 1 and 2, this one graded/guided by me (10%)
Final proposal: (20%) with a budget of $5k inclusive of travel, consumables etc.
(Chapter 1: Overview of text)
(Chapter 2: Basics of genomic data)
(Chapter 3: Mutational diversity)
(Chapter 4: Types of spatial diversity)
(Chapter 5: Population models)
(Chapter 6: Adding in reality of landscapes)
(Chapter 7: Getting into selection etc.)
(Chapter 8: The phenotype and quantitative traits)
(Chapter 9: Parentage)
(Chapter 10: Intuition and surprises)
On the first day of class we will prep our computers to use R/RStudio as a major resource in this class. If at all possible, before the class begins you should install R:
and RStudio (free version):
https://rstudio.com/products/rstudio/
Please note the risk in all of this is that packages and versions of software are constantly changing, and sometimes code that has been working will stop (and vice-versa) because of these changes. Additionally, a key element of making this work - currently - is making sure that the path is set correctly so that this .Rmd file can find figures and code to interact with. I’m hoping I’ve set this up so that everything works from the directory you downloaded, but we will double-check today.
This is an R Markdown document, with Shiny apps built in. At this point in time, the Shiny apps are all written by the talented Dr. Silas Tittes and are available at https://github.com/silastittes/shiny_popgen.
What does that mean? Markdown is a simple formatting syntax for authoring HTML, PDF, and Microsoft Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com. Adding the Shiny apps means that the document is interactive. The only downside is that it means that users must have R and RStudio installed, plus a few R packages, on their computer.
(OK, another downside is it is going to be difficult to read in a hammock. Or, at least, you should read something else if you are in a hammock.)
The upside is that it is more of a living document. It means that as data change, the output of analysis can change. It also means that R code can be built directly into the document so that you can see how some figures or modules are generated, and you can build on this knowledge. You can embed an R “code chunk” like this:
#library(learnPopGen)
#driftplot<-drift.selection(p0=0.25,Ne=500,w=c(1,1,1),ngen=400,nrep=5)
#that package is broken, so add in something better
I’m putting this text together using R Markdown in particular so that examples can be incorporated that students can then work with, to try to understand how varying the input information affects our expectations about the molecular data used to answer ecological questions. For example, the code chunk above, and the figure it produces, illustrates genetic drift: one of 2 alleles is initiated in a population at a frequency of 0.25, the “effective population size” (about which we will learn more) is 500, and there are 5 replicate simulations, each run for 400 generations. In fact, you should notice that the figure is distinct every time you run this document! The chunk also provides the code for the illustration, which can be modified as knowledge of the process becomes more advanced. When looking at the R Markdown code document itself in RStudio, if you hit the green ‘play’ button in the upper right corner of the “code chunk” you can do the simulation over and over, and you can look at the code and probably figure out quickly how to change the parameters it runs under.
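The drift illustration described above can also be sketched in base R, with no add-on packages. This is a minimal sketch, not the learnPopGen code: it assumes a diploid Wright-Fisher population (2 x Ne gene copies resampled each generation) with no selection, and `simulate_drift()` is a hypothetical helper name. The defaults follow the values described above (starting frequency 0.25, Ne of 500, 400 generations, 5 replicates).

```r
# Minimal Wright-Fisher drift sketch in base R (assumption: diploid, neutral).
# simulate_drift() is a hypothetical helper, not part of learnPopGen.
simulate_drift <- function(p0 = 0.25, Ne = 500, ngen = 400, nrep = 5) {
  # rows = generations 0..ngen, columns = replicate populations
  freqs <- matrix(NA_real_, nrow = ngen + 1, ncol = nrep)
  freqs[1, ] <- p0
  for (g in seq_len(ngen)) {
    # each generation, 2 * Ne gene copies are drawn at random,
    # using the previous generation's allele frequency
    copies <- rbinom(nrep, size = 2 * Ne, prob = freqs[g, ])
    freqs[g + 1, ] <- copies / (2 * Ne)
  }
  freqs
}

drift <- simulate_drift()

# one line per replicate; no set.seed() here, so every run looks different
matplot(0:400, drift, type = "l", lty = 1, ylim = c(0, 1),
        xlab = "Generation", ylab = "Allele frequency")
```

Try changing `Ne` to 50 or `p0` to 0.5 and rerunning: smaller populations should wander toward 0 or 1 much faster.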
By organizing this material the way I think it may come across to beginning students in the field, I hope to avoid the puzzle I ran into when I initially shifted this class from 8000-level (intermediate grad course) to 4000-level (advanced undergrads with fewer prerequisites), by clarifying these probabilistic processes with illustrations based on simulations that the students can themselves repeat.
Because I’m using Shiny code, you cannot “Knit” this document into a static form. Instead, you will hit “Run Document” (up near the top of the RStudio screen) once it is loaded into R, and that will open the text in a browser, where it is dynamic in some places to let you run the simulations. It is a work in progress, but for now it does mean that to work with this you must have access to a computer that will run R and RStudio, at a minimum.
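Before hitting “Run Document”, it may help to check that the needed packages are already installed. The sketch below is only an illustration: the package list (`shiny`, `rmarkdown`) is an assumption about the minimum this document needs, and `find_missing()` is a hypothetical helper name.

```r
# Assumed minimum package list; the real document may need more.
needed <- c("shiny", "rmarkdown")

# find_missing() is a hypothetical helper: it returns the packages in
# `pkgs` that cannot be loaded from this R library.
find_missing <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

missing_pkgs <- find_missing(needed)
if (length(missing_pkgs) > 0) {
  message("Still to install: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install from CRAN
} else {
  message("All set.")
}
```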
This will be less a textbook that you read for complete comprehension, and more something you read to generate questions that we discuss; I am trying to “flip the classroom” and organize for future classes at the same time. Some aspects will need to be explained in class or using diverse media to make sense. When I was a sophomore learning cell biology, I tested well on the subject, but in the end I had zero clue what gel electrophoresis meant until I did it on a daily basis. (super basic intro to “electrophoresis”: https://www.youtube.com/watch?v=ZDZUAleWX78) So, your job is to ask questions! That way, we learn more completely not just from me, but from each other and from our inquiry.
For other users of this document, please note: I use a lot of examples I am familiar with, meaning they are often projects I’m an author on, or I was on that person’s committee, or whatever. There is so much other fantastic science out there; this isn’t about ego, though. It’s just my ability to immediately dig deeper with those as examples not just of how the science can be done, but also of when it could have been done better. Also, I’m a marine ecologist; when I talk about plants, it is basically because of great colleagues who have entranced me with their weird and important terrestrial photosynthetic life, and mammals and fish similarly: I credit cool colleagues who have brought me into the fold. If you end up using this as an instructor, I encourage you to think about including your own, even better, examples.
Plus, I can imagine now I can just look in here for some of the papers I want others to refer to, you know - what was that paper I cited about XXXXXX? Oh yeah it was in chapter 4…
The phrase ‘molecular ecology’ is nothing new; it is in many ways synonymous with ‘ecological genetics’ as first applied by pioneers like Dobzhansky, Ford, and others (https://en.wikipedia.org/wiki/Molecular_ecology). Maybe the question of terminology comes down to people who identify as geneticists but want to solve ecological concerns (Phil Hedrick and his work on Florida panthers in 1995? I’ve not met him to verify how he identifies as a scientist), or people who identify as ecologists but recognize how to use genomic data as a means to greater understanding (can’t help my bias, I think of Rick Grosberg doing a number of deep explorations into behavioral ecology via understanding genome-wide kinship, see Fig 1.1). The journal Molecular Ecology (https://onlinelibrary.wiley.com/journal/1365294x) began in 1992; the field is not new, but the attention given to it from a broader spectrum of scientists seems to be. Before giving an overview of what may be included in this topic, it is probably first important to acknowledge that there are “molecular” approaches to addressing ecological questions that are often not included in this field.
Fig.1.1 - A figure from Ayre & Grosberg (2005, doi:10.1016/j.anbehav.2004.08.022), title starting “Behind anemone lines…”, about how anemones interact based on their relatedness: they do not have to have identical genomes to interact cooperatively, but they have to share allelic diversity above a certain threshold at several genomic loci. Otherwise they fight.
Ecosystem ecologists ask about elemental and nutrient cycles in the environment, and such work routinely screens for the abundance, provenance, and isotopic ratios of carbon, nitrogen, phosphorus, and other elements key to life (https://en.wikipedia.org/wiki/Ecosystem_ecology). A good example might be the work of Dr. Krista Capps on invasive suckermouth catfish (family Loricariidae) in Mexico; these catfish have bony plates on their body that absorb tremendous amounts of phosphorus from the rivers they are in, limiting algal growth and thus indirectly harming the resources for native fishes. Certainly, a molecular component to ecology! Also, the chemical analysis of otoliths and gastropod shells (e.g. https://www.pnas.org/content/116/14/6878), or assessment of paleoenvironments via analysis of gas composition in ice cores or otherwise (https://www.wm.edu/news/stories/2019/for-chesapeake-oysters,-the-way-forward-leads-backback-through-the-fossil-record.php), are ‘molecular’ approaches to answering ecological questions.
However, this is where we return to that phrase ‘ecological genetics’, which puts our field squarely in connection with how heritable information - DNA, RNA, and proteins - can be studied to evaluate the relationships of organisms as a means of considering migration, isolation, population demography, mating and kinship analysis, and more. These questions can only be addressed because of evolution of the molecules in question. Mutations occur and may be passed on through reproduction; as mutations become common in a population, they become the basis of the markers we track to address such questions using population genetic understanding of evolutionary mechanisms such as drift, non-random mating, selection, and migration.
In particular, we may need to know this information to bridge the gap between studies of quantitative trait diversity and how traits affect an organismal response to a changing environment. Molecular data will not, as we will examine, tend to replace detailed studies of quantitative genetics, reaction norms, or similar evaluations of how particular genotypes perform in particular environments. Instead, these molecular data - all derived from the genomes carried around by the organisms we study - give us insight into all of the evolutionary mechanisms that allow inference of how the organisms move naturally, and how genes within their genomes respond distinctly across environmental gradients. It will also give us some ideas to improve the design of quantitative or comparative studies of natural biodiversity.
Thus, this text will follow some basic outlines that you may find in other books like Joanna Freeland’s Molecular Ecology 3rd ed (2020) or Matt Hahn’s Molecular Population Genetics (2019), excellent resources in distinct ways. However, since I often work across many resources in an attempt to save students some money, and each of these texts is aimed at a slightly distinct target audience, this is going to give us the basic framework for exploring heritable molecular diversity in a way that keeps the focus primarily on the ecological questions and contemporary ways to make inference from DNA, RNA, or comparable data. Also, I am going to deviate from typical texts in this field in one way in particular: I won’t be delving as much into the historical development of the field, which has often served as the organizing framework for many such books, i.e. the idea that as markers advance, our analyses have advanced. I’m going to argue that is not true; we are actually using fairly traditional population genetic analyses these days with more data, and better data. The methods used in other periods (e.g. the heyday of “phylogeography”) were actually proxies for population genetic theory (Templeton, Avise).
Finally, I’ll note something I’m trying out in terms of verbiage. For a long time, people have talked about population genetics and conservation genetics and ecological genetics. However, with part of my appointment being in a Department of Genetics, I can see that for the most part we are not asking questions about how diversity is inherited or the cellular processes that interact as much as we are about how the diversity across a genome (or portions of it), and how it is distributed, indicate the evolutionary and ecological processes acting on it. This applies to early work in Drosophila pseudoobscura and chromosome rearrangements straight through to modern RAD-seq approaches and whole-genome resequencing. The distinction between “genetics” and “genomics” is not, to me, about the precise number of markers you are studying but in the intent of analysis. I may not want to know anything about the identity of a gene that is an outlier in terms of cytonuclear disequilibria, because I don’t want to resolve how a nuclear gene and a mitochondrial gene are interacting. That is for somebody else to do! The fact that they interact gives each of them special identity in helping us see patterns that are driven by the environment and interactions with the environment, and so the patterns are for us. It’s a distinction that is open for discussion of course.
Chapter 2 will deal with how molecular markers are generated: what are molecular markers (extending to diploid data and to cost-effective ways to explore genomes), what do they cost in time and money, and what sampling guidelines should we consider? Some elements of sampling won’t make sense until we get into the types of inference and analysis used with particular questions, so in some places these will be left as questions for us to return to. The chapter also covers how markers can be applied using barcoding, environmental DNA, and community ordination to understand distribution and abundance.
Chapter 3 will provide additional grounding in how mutations and diversity are generated - pretty key, especially for students with less exposure to introductory genetics coursework.
Chapter 4 is about the basic elements of alpha and beta diversity – that is, the diversity at a single location and the difference in diversity across locations. As ecology is often focused on the distribution and abundance, these approaches let us more accurately define the prevalence of certain subsets of biodiversity so we can more accurately assign locations to distinct communities or systems. The ‘space for time’ argument applies both ecologically and genetically through the process of drift at a minimum (Hubbell, Vellend) so that we expect different locations to have different diversity in part because they are demographically independent; of course migration (and gene flow) will affect this and that is one of our major goals to understand in this field.
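As a small preview of that distinction, here is a hedged sketch in base R. The counts are made up purely for illustration, and richness, the Shannon index, and Jaccard dissimilarity are just one common choice each for alpha and beta diversity, not necessarily the measures Chapter 4 will use.

```r
# Made-up counts of four haplotypes (or species) at two sites; illustration only.
counts <- rbind(
  site1 = c(A = 10, B = 5, C = 0, D = 1),
  site2 = c(A = 2,  B = 0, C = 8, D = 6)
)

# Alpha diversity (within a site): richness = number of types present
richness <- rowSums(counts > 0)

# Alpha diversity again: Shannon index H = -sum(p * log(p)),
# computed over the types actually present at each site
shannon <- apply(counts, 1, function(x) {
  p <- x[x > 0] / sum(x)
  -sum(p * log(p))
})

# Beta diversity (between sites): Jaccard dissimilarity,
# 1 - (types shared by both sites) / (types found at either site)
present <- counts > 0
shared  <- sum(present["site1", ] & present["site2", ])
total   <- sum(present["site1", ] | present["site2", ])
jaccard <- 1 - shared / total

richness
shannon
jaccard
```

Both sites here hold three types, so richness alone cannot tell them apart; the beta measure is what shows that they share only half of the types found overall.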
Chapters 5 and 6 deal with basic evolutionary mechanisms and what they can do to molecular diversity: generalizable population models and how to tell when the data indicate a more complex model, e.g. HWE (Hardy-Weinberg equilibrium) and the coalescent; and an overview of finite population size: Ne and all the distinct ways it is measured, WHY IT AFFECTS DIVERGENCE RATE, and all the distinct ways it is only kind-of useful, from Hare et al 2011 (and Waples before him) to human evolution and even taking a swing at Turner et al 2002 and Alo et al 2004 (which is more likely correct given the distinctions). We will talk about population models, mutation accumulation and biogeography to get at \(\mu\), basic info on movement in the sea based on genomic diversity and so on, what we know of recombination, and this all lets us get to …
Chapter 7 where we deal somewhat with how knowing this baseline information helps us think about what selection does across distinct environments. This is often a target for research, but it takes so much baseline information to really understand outlier molecular diversity.
That sets us up for Chapter 8 which gets into quantitative genetics and the association with genomic data, because what we know of selection is that many traits are super polygenic. We will discuss and work with RNA expression data, learn a bit about epigenomic markers as well, and discuss ‘keystone loci’ that have effects on the ‘extended phenotype’ of populations.
Chapter 9 will give us time to explore mating and behavior - collective as well as individual.
Chapter 10 deals with where the field is going and spends some time focusing on the ‘natural history awareness’ of the analyses we have learned; often key insights come from seeing how data behave or misbehave given your preconceptions.
N.B. I am aiming this at upper-level undergrads in ecology or evolutionary biology who may have had some introductory genetics or evolution; but, I am going to do my best to not assume you remember everything from those classes. Beginning grad students will also benefit, but should be encouraged to lead the class in paper discussions or experiential workshops to help build their depth of understanding.
Also, with this being written in the work-at-home era of COVID, some references will be scant pointers to the actual resource and I hope you will forgive me when I know who I’m pulling from but can’t find it right away.
Week 1 reading:
Travis (2020): https://www.journals.uchicago.edu/doi/pdfplus/10.1086/708765
Marmeisse et al. 2013 https://nph.onlinelibrary.wiley.com/doi/epdf/10.1111/nph.12205 to think about what molecular tools can tell us about diversity and ecosystem function
Govindarajan et al. 2015 https://peerj.com/articles/926/# to consider how divergence of populations (tree thinking) illuminates divergence of function, tolerances, or interactions; distribution and dispersal; and first look at summary statistics in molecular ecology in terms of barcode gaps/distinctions
Fig.1.2 - A pleco caught in the Chacamax River in Chiapas, Mexico - photo credit Krista Capps.
To wrap this up, a photo of one of the invasive Loricariid catfish mentioned earlier. More info on the system can be found at https://news.cornell.edu/stories/2013/08/freeing-pet-catfish-can-devastate-ecosystems . Can you think of how studying genomic diversity of these catfish - as well as the source populations from which they come - could be useful?
Resources cited in this section - I will typically cite in-line actually
Avise 2000
Ayre & Grosberg (2005, doi:10.1016/j.anbehav.2004.08.022)
Freeland, J. 2020.
Hahn, M.W. 2019.
Hedrick, P.W. 1995.
Templeton, A.R. (NCA era)